Method and system for computer-assisted test construction performing specification matching during test item selection

ABSTRACT

A method and system for constructing a test using a computer system that performs specification matching during the test creation process is disclosed. A test developer determines one or more test item databases from which to select test items. The test item databases are organized based on psychometric and/or content specifications. The developer can examine the textual passages, artwork or statistical information pertaining to a test item before selecting it by clicking on a designation of the test item in a database. The developer can then add the test item to a list of test items for the test. The test development system updates pre-designated psychometric and content specification information as the developer adds each test item to the test. The test developer can use the specification information to determine whether to add to, subtract from, or modify the list of test items selected for the test.

FIELD OF THE INVENTION

The present invention generally relates to the field of test construction. The present invention particularly relates to a method and system for constructing a test using a computer system. Specifically, the present invention relates to a method and system for constructing a test using a computer system that performs specification matching during the test creation process.

BACKGROUND OF THE INVENTION

Testing services administer a variety of standardized tests. For example, the Graduate Management Admission Test® (GMAT®) evaluates graduate business school applicants by measuring general verbal, mathematical, and analytical writing skills. The Graduate Record Examinations® (GRE®) assists graduate schools and departments in graduate admissions activities. Tests offered include the General Test, which measures developed verbal, quantitative, and analytical abilities, and the Subject Tests, which measure achievement in 14 different fields of study. The Scholastic Assessment Test® (SAT®) Program includes the SAT I: Reasoning Test and SAT II: Subject Tests. The SAT I is a three-hour test, primarily multiple-choice, that measures verbal and mathematical reasoning abilities. The SAT II: Subject Tests are one-hour, mostly multiple-choice, tests in specific subjects. These tests measure knowledge of particular subjects and the ability to apply that knowledge. Colleges and universities typically use the SAT® Program as a factor in determining admission or placement of prospective students. Individual states also administer tests to determine whether and to what extent students meet state standards for educational achievement.

Many tests, such as the above-mentioned tests, are offered multiple times during a year and/or are administered over multiple years. It is important, in the case of tests that are offered multiple times during a year, that the different administrations of each test be approximately equal in difficulty in order to properly rate examinees from different testing dates against one another. For tests that are administered over multiple years, it is important that each test be of a known difficulty level to accurately assess an examinee's performance and progress. Moreover, it may be important to evaluate other psychometric specifications and statistical properties for a given test prior to its administration.

Some current methods for constructing tests, including those using a computer interface, permit a test developer to view and select test items. Other methods can display a match between content specifications and the content properties of the selected test items. For example, such test construction systems typically keep track of metrics such as the number of questions that test a particular subject. On the SAT I, for example, questions are divided into mathematics and verbal questions. Additionally, the test construction system could also keep track of the number of questions that are devoted to a sub-topic (such as geometry or algebra) or that are presented in a certain format (such as an analogy completion, sentence completion or word problem). By identifying the number of questions of a particular type included in the developed test, the test developer may be alerted if an incorrect number of questions or an incorrect number of questions of a particular type are included in the test.

However, systems implementing these methods do not combine all of the features listed above to permit the test developer to develop tests more quickly, while at the same time including the ability to determine if the selected test items meet psychometric specifications for a test and also permitting a test developer to examine content or psychometric specifications during the test development process so that the test developer can add, remove or replace test items to adjust for deficiencies with respect to test specifications during the test item selection process.

Thus, a need exists for an evaluation tool that determines whether defined content and psychometric specifications for a test are met by a particular question set.

A further need exists for providing psychometric and statistical information to a test creator during the test creation process to permit evaluation and adjustment of the selected test items during the test creation process.

SUMMARY OF PREFERRED EMBODIMENTS

Before the present methods, systems, and materials are described, it is to be understood that this invention is not limited to the particular methodologies, systems and materials described, as these may vary. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.

It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to a “test item” is a reference to one or more test items and equivalents thereof known to those skilled in the art, and so forth. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. Although any methods, materials, and devices similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred methods, materials, and devices are now described. All publications mentioned herein are incorporated by reference. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

The present invention includes a database of test items and information regarding test items and tests. The information may include the content structure and statistical properties of test items and the content and psychometric specifications for tests to be constructed from the database. Psychometric specifications include specifications related to the measurement of human characteristics. Psychometric specifications may be used to develop tests in areas such as intelligence testing, personality testing and vocational testing.

The present invention may further include a systematic procedure for selecting items for a test using a computer system connected to the database described above. As the test is being constructed, the extent to which the developing test matches the content and psychometric specifications for the test may be displayed so that a developer may adjust the set of items selected to best match those specifications.

In a preferred embodiment, a method of constructing a test includes selecting a test item for inclusion in a set of selected test items, updating at least one evaluation statistic based on the selected test item, and revising the set of selected test items to substantially correlate the at least one evaluation statistic with at least one specification for a test. The test item may be selected at least in part based on a subject matter for the test item. The at least one evaluation statistic may be selected from content specifications and psychometric specifications. The at least one specification may also be selected from content specifications and psychometric specifications. The content specifications may include a number of test items to be presented in each of one or more pre-determined formats, a total number of test items to be included in the set of selected test items, a number of test items for testing each of one or more pre-determined subject matters, a key distribution, a percentage of test items having one or more pre-defined characteristics, a gender or racial orientation of test items, and a language in which the test items are presented. The psychometric specifications may include an overall test difficulty rating, a correlation between a correct response for a selected test item and a particular cognitive or behavioral trait, an orientation of the presentation of questions and answers for the set of selected test items, a number of pages of text for a test, a mean point-biserial, a mean r-biserial, and an arrangement of the set of selected test items.

In a preferred embodiment, a method for constructing a test includes selecting a portion of a test item database from which to select a set of test items for a test having one or more test specifications, displaying information concerning a plurality of test items in the selected portion of the test item database, examining a test item on a display device, selecting the test item for the test, and updating a value for at least one test specification based on specified properties for the selected test item. Selecting a portion of a test item database may be based on the subject matter of the test items contained within the portion of the test item database. Examining a test item may include viewing an image of the test item, statistical properties of the test item, text passages associated with the test item, an answer key, detailed content specifications, reviewers' comments, scoring guidelines, and artwork associated with the test item. The statistical properties may include one or more of a percentage of correct responses for the test item, t-biserials, r-biserials, item response theory parameters, gender-based response statistics, race-based response statistics, a percentage of responses choosing each distractor, and a frequency of previous usage for the test item. Updating a value for at least one test specification may be performed using item response theory. In an embodiment, the method further includes comparing current values for the one or more test specifications with required values for the one or more test specifications. In an embodiment, the method further includes replacing one or more test items in the set of selected test items based on the one or more updated specifications. In an embodiment, the method further includes adding one or more test items to the set of selected test items based on the one or more updated specifications. In an embodiment, the method further includes removing one or more test items from the set of selected test items based on the one or more updated specifications.
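
By way of illustration only, updating a test-level value using item response theory might be sketched as follows. The disclosure does not name a particular item response theory model; the sketch assumes the three-parameter logistic model as one common choice, and the function names `p_correct` and `expected_score` and the parameter values are hypothetical.

```python
from math import exp


def p_correct(theta, a, b, c=0.0):
    """Probability of a correct response under an assumed 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). All values are illustrative.
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))


def expected_score(theta, items):
    """Expected number-correct score at ability theta for the selected items."""
    return sum(p_correct(theta, a, b, c) for (a, b, c) in items)


# Two hypothetical selected items, each given as (a, b, c).
selected = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.2)]
print(round(expected_score(0.0, selected), 3))  # prints 1.1
```

Recomputing `expected_score` each time an item is added or removed gives one possible test-level value that could be compared with a required value.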

In a preferred embodiment, a system for constructing a test includes a processor, a computer-readable medium operably connected to the processor, and a display. The computer-readable medium contains one or more databases each having a plurality of test items. Each test item includes a textual question and one or more answers for the test item, a content structure of the test item, and one or more statistical properties for the test item. The statistical properties may include a percentage of correct responses for the test item, t-biserials, r-biserials, item response theory parameters, gender-based response statistics, race-based response statistics, a percentage of responses choosing each distractor, and a frequency of previous usage for the test item. The computer-readable medium may further include content specifications for a test, and psychometric specifications for the test. In an embodiment, the processor evaluates the content specifications and psychometric specifications for a test while the test is being created and determines a correlation value between the properties of the plurality of test items for the test and the content specifications and psychometric specifications for the test. The display displays the correlation value to the test developer. In an embodiment, the computer-readable medium further contains instructions for performing a method of constructing a test including selecting a test item for inclusion in a set of selected test items, updating at least one evaluation statistic based on the selected test item, and revising the set of selected test items to substantially correlate the at least one evaluation statistic with at least one specification for a test.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of the specification, illustrate preferred embodiments of the present invention and, together with the description serve to explain the principles of the invention. The embodiments illustrated in the drawings should not be read to constitute limiting requirements, but instead are intended to assist the reader in understanding the invention.

FIG. 1 depicts an exemplary process flow for creating a test according to an embodiment of the present invention.

FIG. 2 depicts an exemplary system for creating a test according to an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention relates to a method and system for constructing a test using a computer system. Specifically, the present invention relates to a method and system for constructing a test using a computer system that performs specification matching during the test creation process.

FIG. 1 depicts an exemplary process flow for creating a test according to an embodiment of the present invention. First, the test developer may determine 105 the portion of the test item database or a particular test item database from which to select a test item for the test. The test item database may be a single repository containing all selectable test items for one or more tests. Alternatively, a test developer may choose questions from a plurality of test item databases. The test items contained within a portion of the test item database or within a particular test item database may possess distinguishing characteristics. For example, a particular test item database may contain test items pertaining only to questions for testing knowledge of geometric principles. The characteristics used to distinguish test items within a portion of the test item database or a particular test item database from other test items or test item databases may correspond to psychometric specifications or content specifications for a test. When the test item database or databases are organized in this manner, the test developer may more quickly modify the set of selected test items to match specifications that are not satisfied during test construction.
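
By way of illustration only, determining 105 a portion of a test item database by a distinguishing characteristic might be sketched as follows. The `Item` record, its fields, and the `select_portion` function are hypothetical names chosen for this sketch; the disclosure does not prescribe a storage layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    subject: str      # hypothetical label, e.g. "geometry"
    item_format: str  # hypothetical label, e.g. "multiple-choice"


def select_portion(database, subject):
    """Return the items whose subject matches the requested subject."""
    return [item for item in database if item.subject == subject]


database = [
    Item("Q1", "geometry", "multiple-choice"),
    Item("Q2", "algebra", "multiple-choice"),
    Item("Q3", "geometry", "word-problem"),
]
geometry_items = select_portion(database, "geometry")
print([item.item_id for item in geometry_items])  # prints ['Q1', 'Q3']
```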

The test developer may then view 110 a spreadsheet displaying information about the test items within the portion of the test item database or the particular test item database. A computer system may be used to display the spreadsheet. The test developer may examine 115 an individual test item on a display device of the computer system prior to adding the test item to the test under development. Examining 115 an individual test item may include viewing the test item image, information about the test item and information about related entities such as text passages or artwork associated with the test item. Test item information may include, for example, an answer key, detailed content specifications, reviewers' comments, scoring guidelines (for constructed-response test items), statistical properties for the test item, such as a percentage of correct answers received if the test item was previously administered, t-biserials, r-biserials, item response theory parameters, gender-based response statistics, race-based response statistics, a percentage of responses choosing each distractor, and the frequency with which the question or a similar variant has been included on previous test administrations.
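
By way of illustration only, some of the statistical properties listed above might be derived from the responses recorded at a previous administration as sketched below. The function names and the sample responses are hypothetical. The point-biserial is computed here as the Pearson correlation between a 0/1 item score and the total test score, which is its standard definition but is an assumption with respect to this disclosure.

```python
from collections import Counter
from math import sqrt


def percent_correct(responses, key):
    """Percentage of responses that match the answer key."""
    return 100.0 * sum(1 for r in responses if r == key) / len(responses)


def distractor_percentages(responses, key):
    """Percentage of all responses that chose each incorrect option."""
    counts = Counter(r for r in responses if r != key)
    return {option: 100.0 * n / len(responses) for option, n in counts.items()}


def point_biserial(item_scores, total_scores):
    """Pearson correlation between 0/1 item scores and total test scores."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / sqrt(var_x * var_y)


responses = ["A", "B", "A", "C", "A", "A", "B", "A"]  # hypothetical data
print(percent_correct(responses, "A"))         # prints 62.5
print(distractor_percentages(responses, "A"))  # prints {'B': 25.0, 'C': 12.5}
```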

Upon reviewing the information, the test developer may select 120 individual test items for inclusion in the test. As the developer selects test items, the resulting correlation between content and psychometric specifications for the test and the corresponding characteristics of the user-selected test may be updated 125. If the selected test items do not meet one or more specifications, the developer may revise 130 his or her test item selection until a best matching between the test item properties and the content and psychometric specifications is achieved. In an exemplary embodiment, item response theory may be used in the matching of test item properties to test specifications. In an alternate embodiment, the percentage of examinees that answer a test item correctly may be used in the matching of test item properties to test specifications. These methods of determining test item properties are merely exemplary and are not meant to be limiting. Additional methods for determining test item properties may be performed singly or in combination with the above-listed methods and are intended to be encompassed within the scope of the present invention without limitation.
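
By way of illustration only, the select 120, update 125 and revise 130 steps might be sketched as follows, using the percentage of examinees answering correctly as the item property, as in the alternate embodiment above. The `Item` and `Draft` classes and the particular statistics tracked are hypothetical choices for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    subject: str
    percent_correct: float  # share of prior examinees answering correctly


@dataclass
class Draft:
    """A test under development; statistics are recomputed on every change."""
    items: list = field(default_factory=list)

    def statistics(self):
        counts = {}
        for item in self.items:
            counts[item.subject] = counts.get(item.subject, 0) + 1
        mean = None
        if self.items:
            mean = sum(i.percent_correct for i in self.items) / len(self.items)
        return {"total": len(self.items), "by_subject": counts,
                "mean_percent_correct": mean}

    def add(self, item):
        self.items.append(item)
        return self.statistics()

    def remove(self, item_id):
        self.items = [i for i in self.items if i.item_id != item_id]
        return self.statistics()


draft = Draft()
for item in (Item("Q1", "geometry", 60.0), Item("Q2", "algebra", 40.0),
             Item("Q3", "geometry", 80.0)):
    stats = draft.add(item)
print(stats["by_subject"], stats["mean_percent_correct"])
# prints {'geometry': 2, 'algebra': 1} 60.0

# Revise the selection: replace a geometry item with an algebra item.
draft.remove("Q3")
stats = draft.add(Item("Q4", "algebra", 50.0))
print(stats["by_subject"], stats["mean_percent_correct"])
# prints {'geometry': 1, 'algebra': 2} 50.0
```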

FIG. 2 depicts an exemplary system for creating a test according to an embodiment of the present invention. The system may include a computer system 200 containing a processor 205, a display 210 and a computer-readable medium 215, such as a hard drive, a floppy disk, a CD, a DVD, RAM, ROM, EPROM, EEPROM or other memory or memory storage device. The computer-readable medium 215 may contain a database 220 including potential test items and information about test items and tests.

The information about the test items may include content structure and statistical properties of the test items. The content structure may denote the format of the question, the information being tested, the style of question, and similar content-related information. The statistical properties may include the percentage of examinees that select a particular response, the correlation between selecting a particular response and exhibiting a particular personality trait (in the case of behavioral or psychological testing), and the like.

The information regarding the tests may include content specifications and psychometric specifications for tests to be constructed from the database 220. The content specifications may list requirements for the test such as requiring a certain percentage of test items to be in multiple-choice format, to test verbal skills, or to be of a specified length. Content specifications may also include, without limitation, specifying the overall test length, the number of test items presented on a particular topic, a key distribution, the percentage of test items with particular characteristics, a gender or racial orientation of items and the language in which the test is presented. Psychometric specifications may include, without limitation, a preferred overall test difficulty, a correlation between a correct response for a test item and a particular cognitive or behavioral trait, the orientation of the presentation of questions and answers for test items, mean point-biserials, mean r-biserials and the visual presentation of the testing materials.
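
By way of illustration only, two of the content specifications named above, a key distribution and a percentage of test items in a given format, might be computed for a set of selected test items as sketched below. The dictionary layout of each item and the function names are hypothetical.

```python
from collections import Counter


def key_distribution(answer_keys):
    """Count how often each answer option is the keyed (correct) response."""
    return dict(Counter(answer_keys))


def percentage_with(items, predicate):
    """Percentage of items for which predicate(item) is true."""
    return 100.0 * sum(1 for item in items if predicate(item)) / len(items)


# Hypothetical selected items.
items = [
    {"id": "Q1", "format": "multiple-choice", "key": "A"},
    {"id": "Q2", "format": "multiple-choice", "key": "C"},
    {"id": "Q3", "format": "constructed-response", "key": None},
    {"id": "Q4", "format": "multiple-choice", "key": "A"},
]
multiple_choice = [i for i in items if i["format"] == "multiple-choice"]
print(key_distribution(i["key"] for i in multiple_choice))
# prints {'A': 2, 'C': 1}
print(percentage_with(items, lambda i: i["format"] == "multiple-choice"))
# prints 75.0
```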

The above-listed specifications are merely representative of specifications and properties that may be included in the database 220. It will be evident to one of skill in the art that more or fewer properties and specifications may be included in the database and still be within the scope of the invention.

The computer-readable medium 215 or a second computer-readable medium 225 operably connected to the processor 205 may contain a computer program for implementing a systematic procedure for selecting items for a test. The program may display the extent to which a test under development matches the content and psychometric specifications as a user constructs the test. In this way, the user may replace, remove, or add one or more test items to the set of selected test items to best match those specifications in an efficient manner.
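
By way of illustration only, a program might report the extent to which a test under development matches its specifications as sketched below. The representation of each required value as an inclusive range, the `match_report` function, and the use of the fraction of specifications met as a summary figure are all assumptions made for this sketch; the disclosure does not define how the displayed match is calculated.

```python
def match_report(current, required):
    """Compare current values with required (low, high) inclusive ranges.

    Returns one report line per specification and the fraction of
    specifications that are currently met.
    """
    lines = []
    met = 0
    for name, (low, high) in required.items():
        value = current.get(name)
        ok = value is not None and low <= value <= high
        met += 1 if ok else 0
        status = "met" if ok else "NOT met"
        lines.append(f"{name}: {value} (required {low} to {high}) {status}")
    return lines, met / len(required)


# Hypothetical required and current values.
required = {
    "total_items": (3, 3),
    "geometry_items": (1, 2),
    "mean_percent_correct": (55.0, 65.0),
}
current = {"total_items": 3, "geometry_items": 2, "mean_percent_correct": 50.0}

lines, fraction_met = match_report(current, required)
for line in lines:
    print(line)
print(f"{fraction_met:.0%} of specifications met")  # prints 67% of specifications met
```

A report of this kind identifies which specification is not met, here the mean percentage correct, so that the user can decide which test items to replace, remove, or add.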

Although the invention has been described with reference to the preferred embodiments, it will be apparent to one skilled in the art that variations and modifications are contemplated within the spirit and scope of the invention. The drawings and description of the preferred embodiments are made by way of example rather than to limit the scope of the invention, and it is intended to cover within the spirit and scope of the invention all such changes and modifications.

1. A method of constructing a test, comprising: selecting a test item for inclusion in a set of selected test items; updating at least one evaluation statistic based on the selected test item; and revising the set of selected test items to substantially correlate the at least one evaluation statistic with at least one specification for a test.
2. The method of claim 1 wherein the test item is selected at least in part based on a subject matter for the test item.
3. The method of claim 1 wherein the at least one evaluation statistic is selected from a group including content specifications and psychometric specifications.
4. The method of claim 1 wherein the at least one specification is selected from a group including content specifications and psychometric specifications.
5. The method of claim 4 wherein the content specifications comprise one or more of the following: a number of test items to be presented in each of one or more pre-determined formats; a total number of test items to be included in the set of selected test items; a number of test items for testing each of one or more pre-determined subject matters; a key distribution; a percentage of test items having one or more pre-defined characteristics; a gender or racial orientation of test items; and a language in which the test items are presented.
6. The method of claim 4 wherein the psychometric specifications comprise one or more of the following: an overall test difficulty rating; a correlation between a correct response for a selected test item and a particular cognitive or behavioral trait; an orientation of the presentation of questions and answers for the set of selected test items; a number of pages of text for a test; a mean point-biserial; a mean r-biserial; and an arrangement of the set of selected test items.
7. A method for constructing a test, comprising: selecting a portion of a test item database from which to select a set of test items for a test, wherein the test includes one or more test specifications; displaying information concerning a plurality of test items in the selected portion of the test item database; examining a test item on a display device; selecting the test item for the test; and updating a value for at least one test specification based on specified properties for the selected test item.
8. The method of claim 7 wherein selecting a portion of a test item database comprises selecting a portion of a test item database based on a subject matter for test items contained within the portion of the test item database.
9. The method of claim 7 wherein examining a test item comprises viewing one or more of the following: an image of the test item; statistical properties of the test item; text passages associated with the test item; an answer key; detailed content specifications; reviewers' comments; scoring guidelines; and artwork associated with the test item.
10. The method of claim 7 wherein the statistical properties comprise one or more of the following: a percentage of correct responses for the test item; t-biserials; r-biserials; item response theory parameters; gender-based response statistics; race-based response statistics; a percentage of responses choosing each distractor; and a frequency of previous usage for the test item.
11. The method of claim 7 wherein updating a value for at least one test specification includes using item response theory.
12. The method of claim 7, further comprising: comparing current values for the one or more test specifications with required values for the one or more test specifications.
13. The method of claim 7, further comprising: replacing one or more test items in the set of selected test items based on the one or more updated specifications.
14. The method of claim 7, further comprising: adding one or more test items to the set of selected test items based on the one or more updated specifications.
15. The method of claim 7, further comprising: removing one or more test items from the set of selected test items based on the one or more updated specifications.
16. A system for constructing a test, comprising: a processor; a computer-readable medium operably connected to the processor; and a display, wherein the computer-readable medium contains one or more databases each having a plurality of test items, wherein each test item comprises: a textual question and one or more answers for the test item, a content structure of the test item, and one or more statistical properties for the test item.
17. The system of claim 16 wherein the statistical properties comprise one or more of the following: a percentage of correct responses for the test item; t-biserials; r-biserials; item response theory parameters; gender-based response statistics; race-based response statistics; a percentage of responses choosing each distractor; and a frequency of previous usage for the test item.
18. The system of claim 16 wherein the computer-readable medium further comprises: content specifications for a test; and psychometric specifications for the test.
19. The system of claim 18 wherein the processor evaluates the content specifications and psychometric specifications for a test while the test is being created, wherein the processor determines a correlation value between the properties of the plurality of test items for the test and the content specifications and psychometric specifications for the test, and wherein the display displays the correlation value.
20. The system of claim 16 wherein the computer-readable medium further contains instructions for performing a method of constructing a test comprising: selecting a test item for inclusion in a set of selected test items; updating at least one evaluation statistic based on the selected test item; and revising the set of selected test items to substantially correlate the at least one evaluation statistic with at least one specification for a test.